An exact algorithm to find a maximum weight clique in a weighted undirected graph

We introduce a new algorithm MaxCliqueWeight for identifying a maximum weight clique in a weighted graph, and its variant MaxCliqueDynWeight with dynamically varying bounds. This algorithm uses an efficient branch-and-bound approach with a new weighted graph coloring algorithm that efficiently determines upper weight bounds for a maximum weighted clique in a graph. We evaluate our algorithm on random weighted graphs with node counts up to 10,000 and on standard DIMACS benchmark graphs used in a variety of research areas. Our findings reveal a remarkable improvement in computational speed when compared to existing algorithms, particularly evident in the case of high-density random graphs and DIMACS graphs, where our newly developed algorithm outperforms existing alternatives by several orders of magnitude. The newly developed algorithm and its variant are freely available to the broader research community at http://insilab.org/maxcliqueweight, paving the way for transformative applications in various research areas, including drug discovery.


Graph notations
Table 1 gives an overview of the variables used in the developed MaxCliqueWeight algorithm and its MaxClique-DynWeight variant together with their meaning.

A new algorithm to find a maximum weight clique in a weighted undirected graph
We describe a new algorithm MaxCliqueWeight, in which we have extended the basic algorithm, referred to as MaxClique, developed in 16 .The new MaxCliqueWeight algorithm works on vertex-weighted graphs, that is, graphs where each vertex is assigned a weight represented by a positive integer number.The new algorithm finds a maximum weight clique, i.e., a clique with the highest total sum of its weights, in an undirected vertexweighted graph, as shown in Table 2.
The algorithm recursively explores the search tree of the possible weighted cliques, using a branch-and-bound technique to efficiently prune parts of the search space that cannot contain a maximum weight clique.It consists of the new ColorSortWeight procedure (see Table 3) that provides the upper bound to the weight of the clique that can be found at each step of the search tree, and of the Expand procedure (see Table 4), the recursive procedure that performs the branch-and-bound search.

Input
The algorithm takes as input a weighted graph G represented by a set of vertices R, a set of adjacent vertices Γ(v) for each vertex v ∈ R, and a set of weights W, where each vertex v ∈ R has assigned a weight w, which is a positive number (w > 0).

Output
The set Qmax consisting of vertices of the maximum weight clique found in the input vertex-weighted graph.1.
32. j = j + 1 Vol:.(1234567890)  2) are sorted by their degrees in decreasing order, so that the first vertex has the highest degree of all vertices in R, and the last one has the lowest degree (see line 1 in Table 2).This order of vertices produces the tightest upper bounds to the size of a maximum weight clique in our experiments (see also 16 ).The ColorSortWeight procedure is then called once on the input vertices in the set R (line 2 in Table 2).This procedure efficiently determines the initial upper bound to the weight of a clique (wcolor) for each vertex v∈R if v is selected to form the growing clique.

ColorSortWeight procedure: determination of upper bounds to clique weight
The ColorSortWeight procedure takes a set of vertices R as input and partitions these vertices into color classes C, where vertices in the same color class C[k] are not connected by an edge as shown in Table 3.Here, k represents the color of all vertices in color class C[k].For each vertex p ∈ R, the procedure determines the lowest color k such that no vertex in the k-th color class C[k] is adjacent to p (line 8 in Table 3).If k is greater than the maximum number of colors seen so far represented by the variable maxno (line 10 in Table 3), a new color class is created.Vertex p is then inserted into this color class k and its color k is assigned to it, i.e., color(p) = k.At each step, the weight of color class C[k], initially set to zero, is updated to the weight of the vertex p if the weight of p is larger than the weight of its color class (line 14 in Table 3).This results in each color class C[k] being assigned the maximum weight of its vertices.Thus, any current clique Q consisting of vertices in the set R that is found will have at most k vertices and its weight will be less or equal the sum of the maximum weights of the color classes 1 through k, i.e., weight(Q) ≤ Σ n=1..k (weight(C[n])).This condition holds after line 16 in Table 3.
In the next step, the ColorSortWeight procedure determines the color class (min_k_w) below which vertices cannot be used to extend the growing weighted clique based on the difference between the weight of the currently growing clique Q and the weight of the maximum weight clique found so far Qmax (line 18 in Table 3).The value of min_k_w is found iteratively by starting with min_k_w = 1 and increasing it to maxno; the search is stopped when the sum of the weights of the k smallest color classes is greater than the difference between the weight of the maximum clique found so far Qmax and the weight of the currently growing clique Q.
The vertices in the set R with colors less than min_k_w are not going to be used to extend the current clique, therefore they can be kept in the initial ordering.They are copied to the beginning of R in their initial order of decreasing degrees (lines 21-24 in Table 3).Maintaining the initial order of vertices was shown to produce tighter upper bounds than if order is not maintained 19 and improves efficiency of clique search (see 16 ).On the other hand, vertices in color classes with k greater or equal to min_k_w can form cliques with weights higher than the weight of the maximum weight clique Qmax found so far.These vertices are copied from their respective color classes C, starting from C[min_k_w] and ending with C[maxno], back to the set R in the order in which they appear in each color class (lines 27-32 in Table 3).Each such vertex with color of k is assigned a cumulative weight (wcolor), which is the sum of maximum weights of all color classes 1 to k (see the calculation of ub_k on line 28 in Table 3).A wcolor represents the upper bound to the weight of the clique that can be found in the Expand procedure; wcolor is used for prunning the search tree, which is described next.
Expand procedure: finding a maximum weight clique After the upper bounds of vertices in R are set, the Expand procedure is called.This call is done once during the Initialization phase (see Table 2) as well as at each step during the recursive branch-and bound tree search (line 11 in Table 4) as explained in the following.The Expand procedure explores the search tree of possible weighted cliques, starting with the initially empty set Q representing a set of vertices of the currently growing weighted clique.At each step, Expand selects the vertex p ∈ R with the highest cumulative weight (wcolor) (line 3 in Table 4), which is the last vertex in set R, and removes this vertex p from the set R. If the weight of Q, which is the sum of the weights of all its constituting vertices plus the weight of the selected vertex p is greater than the weight of Qmax, the vertex p is added to Q (line 4 in Table 4).
The subset of vertices Rp ⊂ R, in which each vertex is adjacent to p, is determined (line 6 in Table 4), and if this set Rp is not empty, the ColorSortWeight procedure is called with Rp as an argument.This sets the upper bounds (wcolors) for vertices in set Rp.The Expand procedure is then called recursively with Rp as argument.The recursive calls continue until Rp is empty.If Rp is empty, and the weight of Q is greater than the weight of Qmax (line 12 in Table 4), Qmax is updated to be Q.In any case, if the weight of the candidate clique is greater than Qmax or not, the Expand backtracks removing the added vertex from Q to allow the search along a different branch of the search tree.The result of the Expand procedure is a set Qmax containing vertices of a maximum weight clique that was found in the input vertex-weighted graph.

Dynamically varying bounds for greater efficiency
In the MaxCliqueWeight algorithm the calculation of the degrees and sorting of vertices is performed only once with the initial set of vertices R (see line 1 in Table 2).In 16 , we developed a new technique of varying upper bounds that recalculates the degrees of vertices in Rp in the graph induced by these vertices, i.e., G(Rp), at heuristically determined steps of the Expand procedure.Vertices in Rp are then sorted in a decreasing order with respect to their degrees in G(Rp).The graph coloring algorithm thus considers vertices in Rp sorted by their degrees in the induced graph G(Rp) rather than in G, which increases their tightness.Varying upper bounds enables to reduce the number of steps required to find the maximum clique and improve the run time of the algorithm by as much as an order of magnitude on dense graphs, while preserving its superior performance on sparse graphs 16 .
However, the calculation of degrees is computationally expensive (O(|Rp| 2 ), therefore we need to determine at which steps this should be performed to decrease the overall running time of the maximum clique search.The heuristic by which we determine the steps where the recalculation of degrees in G(Rp) and resorting of vertices is performed assumes that the calculation time is improved only when the candidate set Rp is large.Obviously, set Rp is larger on initial (lower) levels of the recursion of the Expand procedure than on the higher levels.With the recursion level we denote the number of recursive calls of the Expand procedure from the first call to the current branch.For large candidate sets the computational expense related to the computation of tighter bound is much smaller than the cost of investigating false solutions, which arise when applying less tight bounds.
Therefore, we count the number of steps up to and including each level of the recursion in the Expand procedure and also the number of all steps completed so far.Using these two values, we calculate T[level], which is the fraction of steps up to the current level among all the steps completed so far (see line 8 in Table 4).With a new heuristic parameter, Tlimit, we can then limit the use of tighter bounds (recalculation of degrees) to certain levels.While T[level] is less than Tlimit, we perform the calculations of the degrees and sorting and in the Color-Sort algorithm we consider vertices in Rp sorted by their degrees in G(Rp).The Tlimit parameter is set to 0.025 by default 16 , which limits the calculation of degrees to the lower levels of the recursion where Rp is the largest.

Maximum weight clique search on an example graph
To introduce the novel MaxCliqueWeight algorithm, we illustrate its operation through a step-by-step walkthrough on a sample weighted graph.This graph comprises seven vertices, labeled one through seven and depicted as circles.Each vertex is associated with a positive integer weight, denoted within parentheses, as shown in Figure 1.
In Step 1, we initiate the Expand procedure for the first time (see line 3 in Table 2).At this point, the set R contains vertices of the input graph, sorted by their degrees in descending order.The first vertex in R (vertex no.4) has five degrees (number of connected edges), while all other vertices have four degrees.The set wcolors, representing the upper bounds for clique weights in the input graph, has been initialized with the ColorSortWeight procedure (as seen in Table 3).Additionally, sets Q and Qmax are empty at this stage.
In Step 2, which marks the next level of the recursion of the Expand procedure, the last vertex in R (vertex no. 7, colored blue in Figure 1) is selected.The condition on line 4 in Table 4, known as the weight bound condition, is satisfied.This condition checks if the upper bound (wcolor) of vertex no. 7, which is wcolor(7) = 21, plus the weight of Q (0) is greater than the weight of Qmax (0).In this case, this condition is met, and vertex no.7 is added to the set Q, representing the growing clique.Simultaneously, the set R is reduced to contain only the vertices adjacent to vertex no. 7.
In Step 3 and Step 4, the last vertices in R, vertex no.6 and vertex no. 5, respectively, are added to the set Q.In Step 5, vertex no. 4 is added.Since Rp is empty, indicating that the clique in Q cannot expand further, the set Q is copied into Qmax.
Moving to Step 6, the Expand branch-and-bound procedure backtracks to the recursion level of Step 1.It then begins constructing a new clique Q by selecting the next-to-last vertex in R (vertex no. 6).The weight bound condition on line 4 in Table 4 is satisfied, as the wcolor of vertex no.6, which is 20, plus the weight of Q (0) is greater than the weight of Qmax (18).Therefore, vertex no.6 is added to the set Q.However, all the vertices in the current set R (vertices no. 4, 5, and 1) have wcolors equal to zero.Consequently, none of these vertices can extend Q's weight beyond that of Qmax, i.e., the weight of Q (2) plus the wcolor of vertex no. 4, 5, or 1 (0) is less than the weight of Qmax (18).
In Step 7, the Expand procedure backtracks to the recursion level of Step 1, and vertex no. 3 is added to the set Q.
In Step 8 vertex no. 1 is added to the set Q, as the weight of Q ( 5), which now includes vertex no. 3, plus the wcolor of vertex no. 1 (15), exceeds the weight of Qmax (18).Vertex no. 2 is then considered for addition but is excluded since the weight of Q (7), including vertices no. 3 and 1, plus the wcolor of vertex no. 2 (0), is less than the weight of Qmax (18).Consequently, the algorithm backtracks to the recursion level of Step 7 and attempts to add another vertex.
In Step 9, vertex no. 5 is added to the set Q because the weight bound condition is met.The weight of the set Q (5) plus the wcolor of vertex no. 5 (15) is greater than the weight of Qmax (18).The algorithm then attempts to extend the clique in Q with the next-to-last vertex in R, vertex no. 2, but this fails.This time, the reason for not adding the vertex is the condition on line 12 in Table 4.This condition checks if weight(Q) > weight(Qmax), which is not met as weight(Q) = 13 < weight(Qmax) = 18.
Finally, in Step 10, vertex no. 4 is added to the set Q, as the weight of Q ( 13) plus the wcolor of vertex no. 4 (7) is greater than the weight of Qmax (18).The weight of the new clique in Q is now 20.Consequently, Qmax is updated to be Q, and Qmax now holds the maximum weight clique (marked red in Figure 1), which is the result of this example calculation.

Results and discussion
To evaluate the developed maximum weight clique algorithm MaxCliqueWeight and its dynamic variant Max-CliqueDynWeight we have tested them on the test set of random weighted graphs as well as on the test set of DIMACS graphs.We have compared our algorithms to the Cliquer algorithm 8,18 for finding a maximum weight clique, which is widely used and well established in the research community.We did not investigate some graph size and edge density combinations and did not consider certain DIMACS graphs due to the expected long computation times.

Random weighted graphs
The first test set consists of 35 random weighted graphs with 100, 200, 300, 500, 700, 1000, 5000, and 10000 vertices, where we refer to the number of vertices as the size of a graph.Edge densities within each graph size category take discrete values of 0.1, 0.3, 0.5, 0.7, 0.8, 0.9, 0.95, 0.99, where edge density of a graph equals to the probability p of an edge between two graph vertices.For each graph of a given size and for each pair of vertices vi and vj within this graph, if a uniform continuous distribution u(vi, vj) for random numbers on the range [0, 1), with equal probability throughout the range, was less than a probability p, then we connected the vertices vi and vj by an edge.
To each vertex v we assigned a weight, which is a positive integer number.The weight assignment is done as follows.For each vertex v we generated a real value random number according to the normal random number distribution defined as: f(x;μ,σ) = 1/(σ•sqrt(2π))e^(− 1/2•((x − μ)/σ)^2), where μ is the mean and σ is the standard deviation; in our experiments, we set μ to 1,000,000 and σ to 200,000.We checked that the resulting random numbers were positive, i.e., > 0. We then rounded the resulting values to the nearest integer values.In this way, we obtained a random positive integer number, a weight, for each vertex v in a graph.

Benchmark graphs from the DIMACS challenge
The second test set consists of 57 graphs from the second DIMACS implementation challenge available at http:// dimacs.rutge rs.edu 17 .The DIMACS implementation challenges help understand and improve the practical performance of algorithms for important problems, particularly those that are hard in the theoretical sense.It is a publicly available collection of benchmark graphs for testing algorithms, such as the maximum clique, graph coloring, and satisfiability algorithms, which are all NP hard problems.Since the DIMACS graphs are unweighted graphs, we assigned vertex weights to them using the same procedure as described for random graphs in the second paragraph of section for generation of random weighted graphs.

Random weighted graphs
The results for random weighted graphs are in Table 5.The best performing algorithm is the MaxCliqueDyn-Weight algorithm, which achieves up to three orders of magnitude speedup compared to the Cliquer algorithm 8,18 on difficult to solve graph instances characterized by their high edge densities.For example, see random graph with 100 vertices and edge density of 0.95, where MaxCliqueDynWeight achieves 6100x speedup.As the edge density approaches 1.0, our algorithm takes less time, e.g., see graphs with 100 and 200 vertices and edge densities of 0.9 and 0.99 in Table 5.This can be explained by the fact that the closer we get to the density of 1.0, the easier the maximum clique problem becomes, since a graph with an edge density of 1.0 is by definition a maximum clique.
Generally, our MaxCliqueWeight and MaxCliqueDynWeight algorithms perform best on dense graphs with edge densitites 0.5-0.99,while they also still retain good performance on sparser graphs.The large speedup of our algorithms is most likely due to the use of efficient upper bound computation based on graph coloring, which allows us to efficiently prune large parts of the search tree compared to the Cliquer algorithm, which has no such upper bounds.

DIMACS benchmark graphs
Next, we tested the developed maximum weight clique algorithms on DIMACS graphs, used in standard benchmarking of maximum clique algorithms.We added random vertex weights to DIMACS graphs as described.The results are in Table 6.
The clear winner among the three algorithms tested is the MaxCliqueDynWeight algorithm.It is especially well suited for denser graphs with edge densities 0.5-0.99,where it achieves speedups of several orders of magnitude compared to Cliquer algorithm.For examples, see »san« and »sanr« graphs in Table 6.On the other hand, for sparse graphs with edge densities 0.1-0.3,which are easier to solve, characterized by the calculation times typically under a second, our algorithm still achieves comparable speeds to the Cliquer algorithm.

Effect of ordering of graph vertices on algorithms' performances
To test the effect of the order of input graph vertices on algorithms' performances, every calculation on random and on DIMACS weighted graphs was repeated 100 times.Each time the input graph vertices were randomly shuffled.This was done before the initialization steps shown in Table 2.The shuffling only changed the position of vertices (vertex numbers) in each input graph, and not the structure of each graph.
We observe that for larger and denser graphs, the effect of initial order on calculation time can be large, e.g., see standard deviations for random graph with 300 vertices and edge density of 0.99 in Table 5, or the MANN_a27 graph with 378 vertices and edge density of 0.99 in Table 6.In these cases, measuring the algorithms' speed with only one ordering of graph vertices could easily result in wrong assesment of this algorithm's performance.If we tested with one particular graph vertices ordering for MANN_a27 graph, we could wrongly assign MaxClique-DynWeight algorithm as the fastest algorithm, while the averaged calculation time shows that MaxCliqueWeight is the winner in this case.
Table 5. Calculation times on random weighted graphs for the MaxCliqueWeight and MaxCliqueDynWeight algorithms compared to the Cliquer algorithm.The fastest calculation times and best speedups in each row are in bold.Calculation times are averaged over 100 runs, where each run was performed with randomly shuffled graph vertices as input.Calculation times that were < 1 ms were set to 1 ms, while those exceeding 2 h were set to 2 h.*Speedups are calculated by dividing the Cliquer algorithm's calculation time with the MaxCliqueWeight's or MaxCliqueDynWeight's calculation time.6) and the random graph with a size of 300 and an edge density of 0.99 (see Table 5), the high standard deviations remain high even if the number of repetitions is increased from ten to one hundred.This indicates that the high standard deviations in these cases are most likely due to the dependence of our algorithms on the initial order of the vertices and not due to a low number of repetitions.
Shuffling graph vertices and multiple repetitions are not common when testing clique algorithms, but can significantly affect the measurements.Shuffling of graph vertices has already been proposed for this reason 20 .Therefore, we recommend testing all maximum weight clique algorithms and other graph algorithms with different initial orders of graph vertices to assess the effects of the different ordering on computational efficiency.
To test whether these results are representative for graphs of the given size and density or whether they are due to idiosyncratic properties of the graphs, we performed additional experiments for random and DIMACS graphs.For each random graph size and density, we generated 100 new random graphs so that we randomized the edges and weights.For each DIMACS graph, we only applied the generation of random weights 100 times to each graph type, since we could not randomize the edges of these graphs as this would destroy their inherent structures.The average calculation times of the tested algorithms for random and DIMACS weighted graphs, respectively, can be found in Supplementary Tables S1 and S2.
The calculation times and speedups obtained with this new method follow a very similar trend to those we observed with random vertex shuffling, which can be seen by comparing Table 5 with Supplementary Table S1 and Table 6 with Supplementary Table S2.The only exception is the san400_0.9_1graph, which is solved much faster by our algorithm when randomizing vertex order than when randomizing vertex weights.We have previously observed that the initial order of vertices has a particular impact on the performance of maximum clique algorithms for this particular graph 16 .

Conclusions
In this work, we describe a new maximum weight clique algorithm MaxCliqueWeight and its variant MaxClique-DynWeight algorithm.The MaxCliqueDynWeight algorithm proves to be the faster of the two algorithms for most random and DIMACS weighted graphs, and is significantly faster than the widely used Cliquer algorithm for dense graphs.Our algorithm is particularly well suited for dense graphs with edge densities 0.5-0.99,but it retains most of its speed for sparser graphs as well.The developed algorithm finds diverse applications across various domains, such as drug discovery and bioinformatics.It holds the potential to significantly accelerate the development of novel drugs through computer-based algorithms.

Table 1 .Table 2 .
Graph variables and definitions used in the clique algorithms.Name Definitions G An input graph G represented by a set of vertices R and a set of adjacent vertices Γ(v) for each vertex v ∈ R R A set of input graph vertices represented by numbers 0…n Γ(v) A set of vertices adjacent (connected by an edge) to each vertex v ∈ R in the input graph G Δ(R) Maximum degree of any vertex in R C A set of color classes C[k], k = 1…|R|+ 1, where each contains non-adjacent vertices only; k represents the color of all vertices in the k-th color class Q A global variable, a set of graph vertices that represent a (weighted) clique being constructed W A set of weights for graph vertices of the input graph (e.g., W[R[i]] denotes the weight of vertex with index i in R) Qmax A global variable, a set of graph vertices that represent a maximum weight clique The initialization steps of the new algorithms MaxCliqueWeight and MaxCliqueDynWeight.Sort vertices in R by decreasing degrees (from the highest to the lowest degree) 2 ColorSortWeight (R) Set initial colors and wcolors of the input vertices in set R. These are the upper bounds to the size and the weight of a maximum clique, respectively 3 Expand (R) Start the recursive branch-and-bound tree search for (weight) cliques while min_k_w < maxno and Σk=1..min_k_w(weight(C[k])) ≤ weight(Qmax) -weight(Q)

Figure 1 .
Figure 1.Example calculation with the MaxCliqueWeight algorithm on a weighted graph.Each vertex is represented as a circle and has a vertex number and below it is in parentheses the vertex's weight.Vertices belonging to the current clique Q are colored in blue.Maximum weight clique is shown with red thick edges.In each step, below the graph are the current values of important variables as they are sampled at the beginning of the Expand procedure (before line 2 in Table4) for steps 1-4 and 6-9 or at the update of Qmax in the Expand procedure (after line 13 in Table4) for steps 5 and 10: R-graph vertices that remain to be explored; wcolorsupper bounds to the weight of a clique, where each wcolor corresponds to the vertex in R at the same index in the list; C-color classes determined in the ColorSortWeight procedure, where each color class corresponds to the vertex in R at the same index in the list; Q-vertices of the current growing clique; Qmax-vertices of the current maximum weight clique.

Table 3
. A new approximate coloring algorithm for weighted undirected graphs used in our new maximum weight clique algorithm MaxCliqueWeight and its variant MaxCliqueDynWeight.Global variables are: Q -a set that contains the currently growing weight clique, Qmax -a set that contains the highest weight clique currently found.For definitions of all variables see Table

Table 4 .
Expand procedure for the new maximum weight clique algorithm MaxCliqueWeight and its variant MaxCliqueDynWeight.Global variables are the set Q that contains the currently growing clique and the set Qmax which contains the largest clique currently found.For definitions of all variables see Table1.

Table 6 .
Calculation times on weighted DIMACS graphs for the MaxCliqueWeight and MaxCliqueDynWeight algorithms compared to the Cliquer algorithm.The fastest calculation times and best speedups in each row are in bold.Calculation times are averaged over 100 runs, each run performed with randomly shuffled graph vertices.Calculation times that were < 1 ms were set to 1 ms, while those exceeding 2 h were set to 2 h.*Speedups are calculated by dividing the Cliquer algorithm's calculation time with the MaxCliqueWeight's or MaxCliqueDynWeight's calculation time.